
<!DOCTYPE html>

<html lang="zh">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Part C: 自适应提升法 &#8212; Datawhale</title>
    
  <link href="../_static/css/theme.css" rel="stylesheet" />
  <link href="../_static/css/index.c5995385ac14fb8791e8eb36b4908be2.css" rel="stylesheet" />

    
  <link rel="stylesheet"
    href="../_static/vendor/fontawesome/5.13.0/css/all.min.css">
  <link rel="preload" as="font" type="font/woff2" crossorigin
    href="../_static/vendor/fontawesome/5.13.0/webfonts/fa-solid-900.woff2">
  <link rel="preload" as="font" type="font/woff2" crossorigin
    href="../_static/vendor/fontawesome/5.13.0/webfonts/fa-brands-400.woff2">

    
      

    
    <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
    <link rel="stylesheet" href="../_static/sphinx-book-theme.5f77b4aec8189eecf79907ce328c390d.css" type="text/css" />
    <link rel="stylesheet" type="text/css" href="../_static/togglebutton.css" />
    <link rel="stylesheet" type="text/css" href="../_static/mystnb.css" />
    
  <link rel="preload" as="script" href="../_static/js/index.1c5a1a01449ed65a7b51.js">

    <script id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
    <script src="../_static/jquery.js"></script>
    <script src="../_static/underscore.js"></script>
    <script src="../_static/doctools.js"></script>
    <script src="../_static/togglebutton.js"></script>
    <script >var togglebuttonSelector = '.toggle, .admonition.dropdown, .tag_hide_input div.cell_input, .tag_hide-input div.cell_input, .tag_hide_output div.cell_output, .tag_hide-output div.cell_output, .tag_hide_cell.cell, .tag_hide-cell.cell';</script>
    <script src="../_static/sphinx-book-theme.12a9622fbb08dcb3a2a40b2c02b83a57.js"></script>
    <script async="async" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/latest.js?config=TeX-AMS-MML_HTMLorMML"></script>
    <script type="text/x-mathjax-config">MathJax.Hub.Config({"tex2jax": {"inlineMath": [["\\(", "\\)"]], "displayMath": [["\\[", "\\]"]], "processRefs": false, "processEnvironments": false}})</script>
    <link rel="shortcut icon" href="../_static/favicon.ico"/>
    <link rel="index" title="索引" href="../genindex.html" />
    <link rel="search" title="搜索" href="../search.html" />
    <link rel="next" title="Part D: 梯度提升树" href="04_gbdt.html" />
    <link rel="prev" title="Part B: 集成模式" href="02_ensemble.html" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <meta name="docsearch:language" content="en" />
    
  </head>
  <body data-spy="scroll" data-target="#bd-toc-nav" data-offset="80">
    
    <div class="container-fluid" id="banner"></div>

    

    <div class="container-xl">
      <div class="row">
          
<div class="col-12 col-md-3 bd-sidebar site-navigation show" id="site-navigation">
    
        <div class="navbar-brand-box">
    <a class="navbar-brand text-wrap" href="../index.html">
      
      <img src="../_static/logo.png" class="logo" alt="logo">
      
      
      <h1 class="site-logo" id="site-title">Datawhale</h1>
      
    </a>
</div><form class="bd-search d-flex align-items-center" action="../search.html" method="get">
  <i class="icon fas fa-search"></i>
  <input type="search" class="form-control" name="q" id="search-input" placeholder="Search the docs ..." aria-label="Search the docs ..." autocomplete="off" >
</form><nav class="bd-links" id="bd-docs-nav" aria-label="Main navigation">
    <div class="bd-toc-item active">
        <ul class="current nav bd-sidenav">
 <li class="toctree-l1 current active has-children">
  <a class="reference internal" href="index.html">
   树模型与集成学习
  </a>
  <input checked="" class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" type="checkbox"/>
  <label for="toctree-checkbox-1">
   <i class="fas fa-chevron-down">
   </i>
  </label>
  <ul class="current">
   <li class="toctree-l2">
    <a class="reference internal" href="01_tree.html">
     Part A: 决策树
    </a>
   </li>
   <li class="toctree-l2">
    <a class="reference internal" href="02_ensemble.html">
     Part B: 集成模式
    </a>
   </li>
   <li class="toctree-l2 current active">
    <a class="current reference internal" href="#">
     Part C: 自适应提升法
    </a>
   </li>
   <li class="toctree-l2">
    <a class="reference internal" href="04_gbdt.html">
     Part D: 梯度提升树
    </a>
   </li>
  </ul>
 </li>
 <li class="toctree-l1 has-children">
  <a class="reference internal" href="../02_em_sampling/index.html">
   EM算法与抽样理论
  </a>
  <input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" type="checkbox"/>
  <label for="toctree-checkbox-2">
   <i class="fas fa-chevron-down">
   </i>
  </label>
  <ul>
   <li class="toctree-l2">
    <a class="reference internal" href="../02_em_sampling/01_em.html">
     EM算法
    </a>
   </li>
   <li class="toctree-l2">
    <a class="reference internal" href="../02_em_sampling/02_sample.html">
     抽样理论
    </a>
   </li>
  </ul>
 </li>
</ul>

    </div>
</nav> <!-- To handle the deprecated key -->

<div class="navbar_extra_footer">
  Theme by the <a href="https://ebp.jupyterbook.org">Executable Book Project</a>
</div>

</div>


          


          
<main class="col py-md-3 pl-md-4 bd-content overflow-auto" role="main">
    
    <div class="topbar container-xl fixed-top">
    <div class="topbar-contents row">
        <div class="col-12 col-md-3 bd-topbar-whitespace site-navigation show"></div>
        <div class="col pl-md-4 topbar-main">
            
            <button id="navbar-toggler" class="navbar-toggler ml-0" type="button" data-toggle="collapse"
                data-toggle="tooltip" data-placement="bottom" data-target=".site-navigation" aria-controls="navbar-menu"
                aria-expanded="true" aria-label="Toggle navigation" aria-controls="site-navigation"
                title="Toggle navigation" data-toggle="tooltip" data-placement="left">
                <i class="fas fa-bars"></i>
                <i class="fas fa-arrow-left"></i>
                <i class="fas fa-arrow-up"></i>
            </button>
            
            
<div class="dropdown-buttons-trigger">
    <button id="dropdown-buttons-trigger" class="btn btn-secondary topbarbtn" aria-label="Download this page"><i
            class="fas fa-download"></i></button>

    <div class="dropdown-buttons">
        <!-- ipynb file if we had a myst markdown file -->
        
        <!-- Download raw file -->
        <a class="dropdown-buttons" href="../_sources/01_tree_ensemble/03_ada.md.txt"><button type="button"
                class="btn btn-secondary topbarbtn" title="Download source file" data-toggle="tooltip"
                data-placement="left">.md</button></a>
        <!-- Download PDF via print -->
        <button type="button" id="download-print" class="btn btn-secondary topbarbtn" title="Print to PDF"
            onClick="window.print()" data-toggle="tooltip" data-placement="left">.pdf</button>
    </div>
</div>

            <!-- Source interaction buttons -->

            <!-- Full screen (wrap in <a> to have style consistency -->

<a class="full-screen-button"><button type="button" class="btn btn-secondary topbarbtn" data-toggle="tooltip"
        data-placement="bottom" onclick="toggleFullScreen()" aria-label="Fullscreen mode"
        title="Fullscreen mode"><i
            class="fas fa-expand"></i></button></a>

            <!-- Launch buttons -->

        </div>

        <!-- Table of contents -->
        <div class="d-none d-md-block col-md-2 bd-toc show">
            
            <div class="tocsection onthispage pt-5 pb-3">
                <i class="fas fa-list"></i> Contents
            </div>
            <nav id="bd-toc-nav">
                <ul class="visible nav section-nav flex-column">
 <li class="toc-h2 nav-item toc-entry">
  <a class="reference internal nav-link" href="#adaboost">
   1. Adaboost概述
  </a>
 </li>
 <li class="toc-h2 nav-item toc-entry">
  <a class="reference internal nav-link" href="#id1">
   2. 分类损失
  </a>
 </li>
 <li class="toc-h2 nav-item toc-entry">
  <a class="reference internal nav-link" href="#samme">
   3. SAMME
  </a>
 </li>
 <li class="toc-h2 nav-item toc-entry">
  <a class="reference internal nav-link" href="#samme-r">
   4. SAMME.R
  </a>
 </li>
 <li class="toc-h2 nav-item toc-entry">
  <a class="reference internal nav-link" href="#adaboost-r2">
   5. Adaboost.R2
  </a>
 </li>
 <li class="toc-h2 nav-item toc-entry">
  <a class="reference internal nav-link" href="#id2">
   知识回顾
  </a>
 </li>
</ul>

            </nav>
        </div>
    </div>
</div>
    <div id="main-content" class="row">
        <div class="col-12 col-md-9 pl-md-3 pr-md-0">
        
              <div>
                
  <div class="section" id="part-c">
<h1>Part C: 自适应提升法<a class="headerlink" href="#part-c" title="永久链接至标题">¶</a></h1>
<div class="section" id="adaboost">
<h2>1. Adaboost概述<a class="headerlink" href="#adaboost" title="永久链接至标题">¶</a></h2>
<p>Adaboost的全称是Adaptive Boosting，其含义为自适应提升算法。其中，自适应是指Adaboost会根据本轮样本的误差结果来分配下一轮模型训练时样本在模型中的相对权重，即对错误的或偏差大的样本适度“重视”，对正确的或偏差小的样本适度“放松”，这里的“重视”和“放松”具体体现在了Adaboost的损失函数设计以及样本权重的更新策略。本课我们将介绍Adaboost处理分类和回归任务的算法原理，包括SAMME算法、SAMME.R算法和Adaboost.R2算法。</p>
</div>
<div class="section" id="id1">
<h2>2. 分类损失<a class="headerlink" href="#id1" title="永久链接至标题">¶</a></h2>
<p>对于<span class="math notranslate nohighlight">\(K\)</span>分类问题而言，当样本标签<span class="math notranslate nohighlight">\(\mathbf{y}=[y_1,...,y_K]^T\)</span>的类别<span class="math notranslate nohighlight">\(S(\mathbf{y})\)</span>为第<span class="math notranslate nohighlight">\(k\)</span>类（<span class="math notranslate nohighlight">\(k=1,...,K\)</span>）时，标签<span class="math notranslate nohighlight">\(\mathbf{y}\)</span>的第<span class="math notranslate nohighlight">\(i\)</span>个（<span class="math notranslate nohighlight">\(i=1,...,K\)</span>）元素<span class="math notranslate nohighlight">\(y_i\)</span>满足</p>
<div class="math notranslate nohighlight">
\[\begin{split}
y_i=\left\{
\begin{aligned}
&amp;1,\quad &amp;{\rm if}\ i=k\\
&amp;-\frac{1}{K-1},\quad &amp;{\rm if}\ i\neq k
\end{aligned}
\right.
\end{split}\]</div>
<div class="margin sidebar">
<p class="sidebar-title"></p>
<p>【练习】假设有一个3分类问题，标签类别为第2类，模型输出的类别标签为[-0.1,-0.3,0.4]，请计算对应的指数损失。</p>
</div>
<p>设模型的输出结果为<span class="math notranslate nohighlight">\(\mathbf{f}=[f_1,...,f_K]^T\)</span>，则记损失函数为</p>
<div class="math notranslate nohighlight">
\[
L(\mathbf{y},\mathbf{f})=\exp(-\frac{\mathbf{y}^T\mathbf{f}}{K})
\]</div>
<p>由于对任意的向量<span class="math notranslate nohighlight">\(a\mathbb{1},a\in \mathbb{R}\)</span>有</p>
<div class="math notranslate nohighlight">
\[
L(\mathbf{y}, \mathbf{f}+a\mathbb{1})=\exp(-\frac{\mathbf{y}^T\mathbf{f}}{K}-\frac{a\mathbf{y}^T\mathbb{1}}{K})=\exp(-\frac{\mathbf{y}^T\mathbf{f}}{K})=L(\mathbf{y}, \mathbf{f})
\]</div>
<p>因此为了保证<span class="math notranslate nohighlight">\(\mathbf{f}\)</span>的可估性，我们需要作出约束假设，此处选择对称约束条件</p>
<div class="math notranslate nohighlight">
\[
f_1+f_2+...+f_K=0
\]</div>
<p>当模型训练完毕后，等价于我们估计出了给定特征<span class="math notranslate nohighlight">\(\mathbf{x}\)</span>情况下<span class="math notranslate nohighlight">\(\mathbf{y}\)</span>属于各个类别的后验概率，在预测时对于一个新的<span class="math notranslate nohighlight">\(\mathbf{x}\)</span>我们总是希望<span class="math notranslate nohighlight">\(L(\mathbf{y},\mathbf{f})\)</span>越小越好，但此时<span class="math notranslate nohighlight">\(y\)</span>是随机变量，因此我们希望损失的后验期望<span class="math notranslate nohighlight">\(\mathbb{E}_{\mathbf{y}\vert\mathbf{x}}\)</span>较小。从概率角度而言，一个设计良好的分类问题损失函数应当保证模型在损失后验期望达到最小时的输出结果<span class="math notranslate nohighlight">\(k^*\)</span>是使得后验概率达到最大的类别<span class="math notranslate nohighlight">\(\mathop{\arg\max}_k P(S(\mathbf{y})=k\vert \mathbf{x})\)</span>，这个条件被称为贝叶斯最优决策条件。在本问题下，<span class="math notranslate nohighlight">\(k^* =\mathop{\arg\max}_kf_k^*(\mathbf{x})\)</span>，即选取<span class="math notranslate nohighlight">\(\mathbf{f}\)</span>中分量最大的位置对应类别输出。</p>
<p>想要验证指数损失是否满足贝叶斯最优决策条件，我们只需证明当损失后验期望达到最小时有</p>
<div class="math notranslate nohighlight">
\[\mathop{\arg\max}_kf_k^*(\mathbf{x})=\mathop{\arg\max}_k P(S(\mathbf{y})=k\vert \mathbf{x})\]</div>
<p>由于此处约束为<span class="math notranslate nohighlight">\(\mathbf{f}\)</span>分量和为<span class="math notranslate nohighlight">\(0\)</span>，因此可以通过拉格朗日乘子法求解。写出拉格朗日函数为</p>
<div class="math notranslate nohighlight">
\[\begin{split}
\begin{aligned}
Lag(\mathbf{f}, \lambda) &amp;= \mathbb{E}_{\mathbf{y}\vert\mathbf{x}}\exp(-\frac{\mathbf{y}^T\mathbf{f}}{K}) + \lambda \sum_{i=1}^K f_i\\
&amp;= \sum_{i=1}^K \left.\exp(-\frac{1}{K}\sum_{t=1}^Ky_tf_t)\right|_{S(\mathbf{y})=i}P(S(\mathbf{y})=i\vert \mathbf{x}) + \lambda \sum_{i=1}^K f_i\\
\end{aligned}
\end{split}\]</div>
<p>当<span class="math notranslate nohighlight">\(S(\mathbf{y})=i\)</span>时，指数内的式子满足</p>
<div class="math notranslate nohighlight">
\[\begin{split}
\begin{aligned}
-\frac{1}{K}\sum_{t=1}^Ky_tf_t &amp;=-\frac{1}{K}[ f_i+\sum_{t\neq i}(-\frac{1}{K-1})f_t]\\
&amp;=-\frac{1}{K}[f_i -\frac{-f_i}{K-1}]\\
&amp;=-\frac{f_i}{K-1}
\end{aligned}
\end{split}\]</div>
<p>此时有</p>
<div class="math notranslate nohighlight">
\[
Lag(\mathbf{f}, \lambda) = \sum_{i=1}^K\exp(-\frac{f_i}{K-1}) P(S(\mathbf{y})=i\vert \mathbf{x}) + \lambda \sum_{i=1}^K f_i
\]</div>
<p>对各变量<span class="math notranslate nohighlight">\(f_1,...,f_K,\lambda\)</span>求偏导数结果式置0后，联立可解得</p>
<div class="math notranslate nohighlight">
\[
f_k^*=(K-1)[\log P(S(\mathbf{y})=k\vert \mathbf{x})-\frac{1}{K}\sum_{i=1}^K\log P(S(\mathbf{y})=i\vert \mathbf{x})],k=1,...,K
\]</div>
<p>此时有<span class="math notranslate nohighlight">\(\mathop{\arg\max}_kf_k^*(\mathbf{x})=\mathop{\arg\max}_k P(S(\mathbf{y})=k\vert \mathbf{x})\)</span>，故选择指数损失能够满足贝叶斯最优决策条件。</p>
</div>
<div class="section" id="samme">
<h2>3. SAMME<a class="headerlink" href="#samme" title="永久链接至标题">¶</a></h2>
<p>SAMME算法的全称是<span class="math notranslate nohighlight">\(\rm{\textbf{S}tagewise\,\textbf{A}dditive\,\textbf{M}odeling\, using\, a\,\textbf{M}ulticlass\,\textbf{E}xponential\, loss\, function}\)</span>，它假定模型的总输出<span class="math notranslate nohighlight">\(\mathbf{f}\)</span>具有<span class="math notranslate nohighlight">\(\mathbf{f}^{(M)}(\mathbf{x})=\sum_{m=1}^M \beta^{(m)} \mathbf{b}^{(m)}(\mathbf{x})\)</span>的形式。其中，<span class="math notranslate nohighlight">\(M\)</span>是模型的总迭代轮数，<span class="math notranslate nohighlight">\(\beta^{(m)}\in \mathbb{R^+}\)</span>是每轮模型的加权系数，<span class="math notranslate nohighlight">\(\mathbf{b}^{(m)}(\mathbf{x}) \in\mathbb{R}^K\)</span>是基模型<span class="math notranslate nohighlight">\(G\)</span>输出类别的标签向量。设样本的标签类别为<span class="math notranslate nohighlight">\(k\)</span>，当基模型预测的样本类别结果为<span class="math notranslate nohighlight">\(k'\)</span>时，记</p>
<div class="math notranslate nohighlight">
\[\begin{split}
b^{(m)}_{k'}=\left\{
\begin{aligned}
&amp;1,\quad &amp;{\rm if}\ k'=k\\
&amp;-\frac{1}{k-1},\quad &amp;{\rm if}\ k'\neq k
\end{aligned}
\right.
\end{split}\]</div>
<p>对于第<span class="math notranslate nohighlight">\(m\)</span>轮迭代而言，上一轮的模型输出为<span class="math notranslate nohighlight">\(\mathbf{f}^{(m-1)}(\mathbf{x})\)</span>，本轮需要优化得到的<span class="math notranslate nohighlight">\(\beta^{*(m)}\)</span>和<span class="math notranslate nohighlight">\(\mathbf{b}^{*(m)}\)</span>满足</p>
<div class="math notranslate nohighlight">
\[
(\beta^{*(m)}, \mathbf{b}^{*(m)})=\mathop{\arg\min}_{\beta^{(m)}, \mathbf{b}^{(m)}}\sum_{i=1}^n L(\mathbf{y}_i, \mathbf{f}^{(m-1)}(\mathbf{x}_i)+\beta^{(m)}\mathbf{b}^{(m)}(\mathbf{x}_i))
\]</div>
<p>由于<span class="math notranslate nohighlight">\(\mathbf{f}^{(m-1)}(\mathbf{x}_i)\)</span>在第<span class="math notranslate nohighlight">\(m\)</span>轮为常数，记</p>
<div class="math notranslate nohighlight">
\[
w_i=\exp(-\frac{1}{K}\mathbf{y}_i^T\mathbf{f}^{(m-1)}(\mathbf{x}_i))
\]</div>
<p>此时有</p>
<div class="math notranslate nohighlight">
\[
(\beta^{*(m)}, \mathbf{b}^{*(m)})=\mathop{\arg\min}_{\beta^{(m)}, \mathbf{b}^{(m)}}\sum_{i=1}^n w_i\exp(-\frac{1}{K}\beta^{(m)}\mathbf{y}_i^T\mathbf{b}^{(m)}(\mathbf{x}_i))
\]</div>
<div class="margin sidebar">
<p class="sidebar-title"></p>
<p>【练习】左侧公式的第二个等号是由于当样本分类正确时，<span class="math notranslate nohighlight">\(\mathbf{y}^T\mathbf{b}=\frac{K}{K-1}\)</span>，当样本分类错误时，<span class="math notranslate nohighlight">\(\mathbf{y}^T\mathbf{b}=-\frac{K}{(K-1)^2}\)</span>，请说明原因。</p>
</div>
<p>设当轮预测正确的样本索引集合为<span class="math notranslate nohighlight">\(T\)</span>，则损失可表示为</p>
<div class="math notranslate nohighlight">
\[\begin{split}
\begin{aligned}
\tilde{L}(\beta^{(m)}, \mathbf{b}^{(m)})&amp;=\sum_{i=1}^n w_i\exp(-\frac{1}{K}\beta^{(m)}\mathbf{y}_i^T\mathbf{b}^{(m)}(\mathbf{x}_i)) \\
&amp;= \sum_{i\in T}w_i\exp[-\frac{\beta^{m}}{K-1}]+\sum_{i \notin T}w_i\exp[\frac{\beta^{(m)}}{(K-1)^2}] \\
&amp;= \sum_{i\in T}w_i\exp[-\frac{\beta^{m}}{K-1}] +\sum_{i\notin T}w_i\exp[-\frac{\beta^{m}}{K-1}] \\
&amp;\qquad - \sum_{i\notin T}w_i\exp[-\frac{\beta^{m}}{K-1}] +\sum_{i \notin T}w_i\exp[\frac{\beta^{(m)}}{(K-1)^2}]\\
&amp;=\exp[-\frac{\beta^{(m)}}{K-1}]\sum_{i=1}^nw_i + \{ \exp[\frac{\beta^{(m)}}{(K-1)^2}]-\exp[-\frac{\beta^{(m)}}{K-1}] \}\sum_{i=1}^nw_i\mathbb{I}_{\{i\notin T\}}
\end{aligned}
\end{split}\]</div>
<p>注意到<span class="math notranslate nohighlight">\(\mathbf{b}^{(m)}\)</span>仅与<span class="math notranslate nohighlight">\(\sum_{i=1}w_i\mathbb{I}_{\{i\notin T\}}\)</span>有关（因为基学习器的好坏控制了样本是否能够正确预测），且此项前的系数非负（因为<span class="math notranslate nohighlight">\(\beta^{(m)}\)</span>非负），因此得到</p>
<div class="math notranslate nohighlight">
\[
\mathbf{b}^{*(m)}=\mathop{\arg\min}_{\mathbf{b}^{(m)}}\sum_{i=1}^n w_i\mathbb{I}_{\{i\notin T\}}
\]</div>
<p>在得到<span class="math notranslate nohighlight">\(\mathbf{b}^{*(m)}\)</span>后，通过求<span class="math notranslate nohighlight">\(\tilde{L}\)</span>关于<span class="math notranslate nohighlight">\(\beta^{(m)}\)</span>的导数并令之为0可解得</p>
<div class="math notranslate nohighlight">
\[
\beta^{*(m)}=\frac{(K-1)^2}{K}[\log\frac{1-err^{(m)}}{err^{(m)}}+\log(K-1)]
\]</div>
<p>其中，样本的加权错误率为</p>
<div class="math notranslate nohighlight">
\[
err^{(m)}=\sum_{i=1}^n\frac{w_i}{\sum_{i=1}^nw_i}\mathbb{I}_{\{i\notin T\}}
\]</div>
<p>样本<span class="math notranslate nohighlight">\(\mathbf{x}_i\)</span>在第<span class="math notranslate nohighlight">\(m\)</span>轮的预测类别为<span class="math notranslate nohighlight">\(k_i^*=\mathop{\arg\max}_{k} \mathbf{f}^{(m)}(\mathbf{x}_i)\)</span>，其中</p>
<div class="math notranslate nohighlight">
\[
\mathbf{f}^{(m)}(\mathbf{x}_i)=\mathbf{f}^{(m-1)}(\mathbf{x}_i)+\beta^{*(m)}\mathbf{b}^{*(m)}(\mathbf{x}_i)
\]</div>
<div class="margin sidebar">
<p class="sidebar-title"></p>
<p>【练习】对公式进行化简，写出<span class="math notranslate nohighlight">\(K=2\)</span>时的SAMME算法流程，并与李航《统计学习方法》一书中所述的Adaboost二分类算法对比是否一致。</p>
</div>
<p>将上述算法过程总结伪代码如下：</p>
<div class="figure align-center">
<a class="reference internal image-reference" href="../_images/ada_algo1.png"><img alt="../_images/ada_algo1.png" src="../_images/ada_algo1.png" style="width: 440px;" /></a>
</div>
<p>事实上，我们还能通过一些多分类的性质来改写算法的局部实现，使得一些变量前的系数得到简化。记</p>
<div class="math notranslate nohighlight">
\[
\alpha^{*(m)}=\log\frac{1-err^{(m)}}{err^{(m)}}+\log(K-1)
\]</div>
<p>此时，<span class="math notranslate nohighlight">\(w_i\)</span>每轮会被更新为</p>
<div class="math notranslate nohighlight">
\[\begin{split}
\begin{aligned}
w^{new}_i&amp;=w_i\exp(-\frac{1}{K}\beta^{*(m)}\mathbf{y}_i^T\mathbf{b}^{*(m)}(\mathbf{x}_i))\\
&amp;=w_i\exp(-\frac{(K-1)^2}{K^2}\alpha^{*(m)}\mathbf{y}_i^T\mathbf{b}^{*(m)}(\mathbf{x}_i))
\end{aligned}
\end{split}\]</div>
<p>当样本分类正确时，<span class="math notranslate nohighlight">\(\mathbf{y}_i^T\mathbf{b}^{*(m)}(\mathbf{x}_i)=\frac{K}{K-1}\)</span>，即</p>
<div class="math notranslate nohighlight">
\[
w^{new}_i=w_i\cdot\exp[\frac{1-K}{K}\alpha^{*(m)}]
\]</div>
<p>当样本分类错误时，<span class="math notranslate nohighlight">\(\mathbf{y}_i^T\mathbf{b}^{*(m)}(\mathbf{x}_i)=-\frac{K}{(K-1)^2}\)</span>，即</p>
<div class="math notranslate nohighlight">
\[
w^{new}_i=w_i\cdot\exp[\frac{1}{K}\alpha^{*(m)}]
\]</div>
<p>从而可以利用示性函数<span class="math notranslate nohighlight">\(\mathbb{1}_{\{i\notin T\}}\)</span>来统一表示<span class="math notranslate nohighlight">\(w_i\)</span>更新的两类结果：</p>
<div class="math notranslate nohighlight">
\[
w^{new}_i=w_i\cdot\exp[\frac{1-K}{K}\alpha^{*(m)}]\exp(\alpha^{*(m)}\mathbb{1}_{\{i\notin T\}})
\]</div>
<div class="margin sidebar">
<p class="sidebar-title"></p>
<p>【练习】在sklearn源码中找出算法流程中每一行对应的处理代码。</p>
</div>
<div class="margin sidebar">
<p class="sidebar-title"></p>
<p>【练习】算法2第12行中给出了<span class="math notranslate nohighlight">\(\mathbf{f}\)</span>输出的迭代方案，但在sklearn包的实现中使用了<span class="math notranslate nohighlight">\(\mathbb{I}_{\{G^*(\mathbf{x})=S(\mathbf{y})\}}\)</span>来代替<span class="math notranslate nohighlight">\(\mathbf{b}^{*(m)}(\mathbf{x})\)</span>。请根据本文的实现，对sklearn包的源码进行修改并构造一个例子来比较它们的输出是否会不同。（提示：修改AdaboostClassifier类中的decision_function函数）</p>
</div>
<div class="margin sidebar">
<p class="sidebar-title"></p>
<p>【练习】请解释将<span class="math notranslate nohighlight">\(\beta^{*(m)}\)</span>替换为<span class="math notranslate nohighlight">\(\alpha^{*(m)}\)</span>不会改变输出类别的原因。</p>
</div>
<p>对<span class="math notranslate nohighlight">\(\mathbf{w}\)</span>进行归一化操作后，不会对下一轮算法1中<span class="math notranslate nohighlight">\(G^*\)</span>和<span class="math notranslate nohighlight">\(err^{(m)}\)</span>的结果产生任何影响。同时，如果把算法1第12行的<span class="math notranslate nohighlight">\(\beta^{*(m)}\)</span>替换为<span class="math notranslate nohighlight">\(\alpha^{*(m)}\)</span>，最后的预测结果（即取<span class="math notranslate nohighlight">\(\arg\max\)</span>产生的类别）也不会产生任何变化。</p>
<p>由于<span class="math notranslate nohighlight">\(\exp[\frac{1-K}{K}\alpha^{*(m)}]\)</span>是样本公共项，故我们可以每次都利用</p>
<div class="math notranslate nohighlight">
\[
\tilde{w}_i = w_i\cdot\exp(\alpha^{*(m)}\mathbb{1}_{\{i\notin T\}})
\]</div>
<p>来更新，而不影响归一化结果。此时，算法1的迭代循环可进行如下重写：</p>
<div class="figure align-center">
<a class="reference internal image-reference" href="../_images/ada_algo2.png"><img alt="../_images/ada_algo2.png" src="../_images/ada_algo2.png" style="width: 380px;" /></a>
</div>
</div>
<div class="section" id="samme-r">
<h2>4. SAMME.R<a class="headerlink" href="#samme-r" title="永久链接至标题">¶</a></h2>
<p>许多分类器都能够输出预测样本所属某一类别的概率，但是SAMME算法只能利用分类的标签信息，而不能利用这样的概率信息。SAMME.R算法通过损失近似的思想，将加权分类模型的概率输出信息与boosting方法相结合。SAMME.R中的字母“R”代表“Real”，意味着模型每轮迭代的输出为实数。</p>
<p>不同于SAMME在第<span class="math notranslate nohighlight">\(m\)</span>轮需要同时考虑得到最优的<span class="math notranslate nohighlight">\(\beta^{(m)}\)</span>和<span class="math notranslate nohighlight">\(\mathbf{b}^{(m)}\)</span>，SAMME.R将其统一为<span class="math notranslate nohighlight">\(\mathbf{h}^{(m)}\in \mathbb{R}^K\)</span>，它需要满足对称约束条件<span class="math notranslate nohighlight">\(\sum_{i=1}^K h_k=0\)</span>以保证可估性。此时，损失函数为</p>
<div class="math notranslate nohighlight">
\[
L(\mathbf{h}^{(m)})=\exp[-\frac{1}{K}\mathbf{y}^T(\mathbf{f}^{(m-1)}(\mathbf{x})+\mathbf{h}^{(m)}(\mathbf{x}))]
\]</div>
<p>为了与概率联系，我们需对损失<span class="math notranslate nohighlight">\(L\)</span>的后验概率进行最小化，即</p>
<div class="math notranslate nohighlight">
\[\begin{split}
\begin{aligned}
\mathbf{h}^{*(m)}&amp;=\mathop{\arg\min}_{\mathbf{h}^{(m)}} \mathbb{E}_{\mathbf{y}\vert\mathbf{x}} [L\vert \mathbf{x}]\\
&amp;=\mathop{\arg\min}_{\mathbf{h}^{(m)}} \mathbb{E}_{\mathbf{y}\vert\mathbf{x}}[ \exp[-\frac{1}{K}\mathbf{y}^T(\mathbf{f}^{(m-1)}(\mathbf{x})+\mathbf{h}^{(m)}(\mathbf{x}))]\vert \mathbf{x}]
\end{aligned}
\end{split}\]</div>
<div class="margin sidebar">
<p class="sidebar-title"></p>
<p>【练习】请说明左式第三个等号为何成立。</p>
</div>
<p>设样本<span class="math notranslate nohighlight">\(\mathbf{y}\)</span>对应的标签为<span class="math notranslate nohighlight">\(S(\mathbf{y})\)</span>，则</p>
<div class="math notranslate nohighlight">
\[\begin{split}
\begin{aligned}
\mathbb{E}_{\mathbf{y}\vert\mathbf{x}} [L\vert \mathbf{x}] &amp;= \mathbb{E}_{\mathbf{y}\vert\mathbf{x}}[ \exp[-\frac{1}{K}\mathbf{y}^T\mathbf{f}^{(m-1)}(\mathbf{x})]
\exp[-\frac{1}{K}\mathbf{y}^T\mathbf{h}^{(m)}(\mathbf{x})]]\vert \mathbf{x}]\\
&amp;=\sum_{k=1}^K\left. [\exp[-\frac{1}{K}\mathbf{y}^T\mathbf{f}^{(m-1)}(\mathbf{x})]\exp[-\frac{1}{K}\mathbf{y}^T\mathbf{h}^{(m)}(\mathbf{x})]]\right|_{S(\mathbf{y})=k}P(S(\mathbf{y})=k\vert \mathbf{x})\\
&amp;=\sum_{k=1}^K \left.[\exp[-\frac{1}{K}\mathbf{y}^T\mathbf{f}^{(m-1)}(\mathbf{x})]\right|_{S(\mathbf{y})=k}P(S(\mathbf{y})=k\vert \mathbf{x})]\exp(-\frac{h^{(m)}_k(\mathbf{x})}{K-1})
\end{aligned}
\end{split}\]</div>
<p>记<span class="math notranslate nohighlight">\(w=\exp[-\frac{1}{K}\mathbf{y}^T\mathbf{f}^{(m-1)}(\mathbf{x})]\)</span>，则</p>
<div class="math notranslate nohighlight">
\[
\mathbb{E}_{\mathbf{y}\vert\mathbf{x}} [L\vert \mathbf{x}] = \sum_{k=1}^K \left.w\right|_{S(\mathbf{y})=k}\cdot P(S(\mathbf{y})=k)\exp(-\frac{h^{(m)}_k(\mathbf{x})}{K-1})
\]</div>
<p>不难发现对于样本<span class="math notranslate nohighlight">\(\mathbf{y}\)</span>而言，越大的<span class="math notranslate nohighlight">\(w\)</span>意味着上一轮的模型结果越糟糕，此时负责预测<span class="math notranslate nohighlight">\(P(S(\mathbf{y})=k)\)</span>的基模型就要加大对该样本的重视程度以获得较小的损失。</p>
<p>但是，此时基模型本身是不带权重的，SAMME.R采用的近似方法是，考虑以<span class="math notranslate nohighlight">\(w\)</span>为权重的基模型<span class="math notranslate nohighlight">\(G\)</span>，用其输出<span class="math notranslate nohighlight">\(P_w(s(\mathbf{y})=k\vert \mathbf{x})\)</span>的概率值来代替<span class="math notranslate nohighlight">\(\left.w\right|_{S(\mathbf{y})=k}\cdot P(S(\mathbf{y})=k\vert \mathbf{x})\)</span>，这种行为合法的原因在于权重对于总体损失的惩罚方向是一致的，<span class="math notranslate nohighlight">\(G\)</span>通过权重<span class="math notranslate nohighlight">\(w\)</span>将原本作用于<span class="math notranslate nohighlight">\(L\)</span>的损失近似地“分配”给了基分类器的损失。</p>
<p>此时，损失函数近似为</p>
<div class="math notranslate nohighlight">
\[
\mathbb{E}_{\mathbf{y}\vert\mathbf{x}} [L\vert \mathbf{x}] = \sum_{k=1}^K P_w(s(\mathbf{y})=k\vert \mathbf{x})\exp(-\frac{h^{(m)}_k(\mathbf{x})}{K-1})
\]</div>
<div class="margin sidebar">
<p class="sidebar-title"></p>
<p>【练习】验证<span class="math notranslate nohighlight">\(h^*_{k'}\)</span>的求解结果。</p>
</div>
<p>由对称约束条件，结合拉格朗日乘子法可得</p>
<div class="math notranslate nohighlight">
\[
h^{*(m)}_{k'}=(K-1)[\log P_w(S(\mathbf{y})=k'\vert \mathbf{x})-\frac{1}{K}\sum_{k=1}^K\log P(S(\mathbf{y})=k\vert \mathbf{x})]
\]</div>
<div class="margin sidebar">
<p class="sidebar-title"></p>
<p>【练习】算法3的第14行给出了<span class="math notranslate nohighlight">\(w_i\)</span>的更新策略，请说明其合理性。</p>
</div>
<p>将上述算法过程总结伪代码如下：</p>
<div class="figure align-center">
<a class="reference internal image-reference" href="../_images/ada_algo3.png"><img alt="../_images/ada_algo3.png" src="../_images/ada_algo3.png" style="width: 580px;" /></a>
</div>
</div>
<div class="section" id="adaboost-r2">
<h2>5. Adaboost.R2<a class="headerlink" href="#adaboost-r2" title="永久链接至标题">¶</a></h2>
<p>利用权重重分配的思想，Adaboost还可以应用于处理回归问题。其中，Adaboost.R2算法是一种最常使用的实现。</p>
<p>设训练集特征和目标分别为<span class="math notranslate nohighlight">\(\mathbf{X}=(\mathbf{x}_1, ..., \mathbf{x}_n)\)</span>和<span class="math notranslate nohighlight">\(\mathbf{y}=(y_1,...,y_n)\)</span>，权重<span class="math notranslate nohighlight">\(\mathbf{w}\)</span>初始化为<span class="math notranslate nohighlight">\((w_1,...,w_n)\)</span>。在第<span class="math notranslate nohighlight">\(m\)</span>轮时，根据权重训练基预测器得到<span class="math notranslate nohighlight">\(G^*\)</span>，计算每个样本的相对误差</p>
<div class="math notranslate nohighlight">
\[
e_{i}=\frac{\vert y_i-G^*(\mathbf{x}_i)\vert}{\max_i \vert y_i-G^*(\mathbf{x}_i)\vert}
\]</div>
<p>设样本的加权相对误差率为<span class="math notranslate nohighlight">\(E^{(m)}=\sum_{i=1}^n w_ie_i\)</span>，则相对误差率与正确率的比值为<span class="math notranslate nohighlight">\(\beta^{(m)}=\frac{E^{(m)}}{1-E^{(m)}}\)</span>，即预测器权重<span class="math notranslate nohighlight">\(\alpha^{(m)}=\log \frac{1}{\beta^{(m)}}\)</span>。</p>
<p>更新权重<span class="math notranslate nohighlight">\(w_i\)</span>为<span class="math notranslate nohighlight">\(w_{i}[\beta^{(m)}]^{1-e_{i}}\)</span>，权重在归一化后进入下一轮训练，由此可如下写出训练算法：</p>
<div class="figure align-center">
<a class="reference internal image-reference" href="../_images/ada_algo4.png"><img alt="../_images/ada_algo4.png" src="../_images/ada_algo4.png" style="width: 380px;" /></a>
</div>
<div class="margin sidebar">
<p class="sidebar-title"></p>
<p>【练习】请结合加权中位数的定义解决以下问题：</p>
<ul class="simple">
<li><p>当满足什么条件时，Adaboost.R2的输出结果恰为每个基预测器输出值的中位数？</p></li>
<li><p>Adaboost.R2模型对测试样本的预测输出值是否一定会属于<span class="math notranslate nohighlight">\(M\)</span>个分类器中的一个输出结果？若是请说明理由，若不一定请给出反例。</p></li>
<li><p>设<span class="math notranslate nohighlight">\(k\in \{y_1,...,y_M\}\)</span>，记<span class="math notranslate nohighlight">\(k\)</span>两侧（即大于或小于<span class="math notranslate nohighlight">\(k\)</span>）的样本集合对应的权重集合为<span class="math notranslate nohighlight">\(W^+\)</span>和<span class="math notranslate nohighlight">\(W^-\)</span>，证明使这两个集合元素之和差值最小的<span class="math notranslate nohighlight">\(k\)</span>就是Adaboost.R2输出的<span class="math notranslate nohighlight">\(y\)</span>。</p></li>
<li><p>相对于普通中位数，加权中位数的输出结果鲁棒性更强，请结合公式说明理由。</p></li>
</ul>
</div>
<p>在预测阶段，Adaboost.R2使用的是加权中位数算法。设每个基模型对某一个新测试样本的预测输出为<span class="math notranslate nohighlight">\(y_1,...,y_M\)</span>，基模型对应的预测器权重为<span class="math notranslate nohighlight">\(\alpha^{(1)},...,\alpha^{(M)}\)</span>，则Adaboost.R2的输出值为</p>
<div class="math notranslate nohighlight">
\[
y=\inf \{ y\big| \sum_{m\in \{m\vert y_m\leq y\}}\alpha^{(m)} \geq 0.5 \sum_{m=1}^M\alpha^{(m)}\}
\]</div>
</div>
<div class="section" id="id2">
<h2>知识回顾<a class="headerlink" href="#id2" title="永久链接至标题">¶</a></h2>
<ol class="simple">
<li><p>二分类问题下，Adaboost算法如何调节样本的权重？</p></li>
<li><p>样本A在当轮分类错误，且样本B在当轮分类正确，请问在权重调整后，样本A的权重一定大于样本B吗？</p></li>
<li><p>在处理分类问题时，Adaboost的损失函数是什么？请叙述其设计的合理性。</p></li>
<li><p>Adaboost如何处理回归问题？</p></li>
<li><p>用已训练的Adaboost分类模型和回归模型来预测新样本的标签，请分别具体描述样本从输入到标签输出的流程。</p></li>
<li><p>观看周志华老师的讲座视频<a class="reference external" href="https://www.bilibili.com/video/BV1Cs411c7Zt">《Boosting 25年》</a>并谈谈体会。</p></li>
</ol>
</div>
</div>


              </div>
              
        
        <div class='prev-next-bottom'>
            
    <a class='left-prev' id="prev-link" href="02_ensemble.html" title="previous page">Part B: 集成模式</a>
    <a class='right-next' id="next-link" href="04_gbdt.html" title="next page">Part D: 梯度提升树</a>

        </div>
        
        </div>
    </div>
    <footer class="footer mt-5 mt-md-0">
    <div class="container">
      <p>
        
          By GYH<br/>
        
            &copy; Copyright 2021, GYH.<br/>
      </p>
    </div>
  </footer>
</main>


      </div>
    </div>
  
  <script src="../_static/js/index.1c5a1a01449ed65a7b51.js"></script>

  
  </body>
</html>